{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "568fd96a-8cf6-42aa-b9cf-74b7aa383595",
   "metadata": {},
   "source": [
    "# Ollama Website Summarizer\n",
    "## Scrape websites and summarize them locally using Ollama\n",
    "\n",
    "This script is a complete example of the day 1 program, which uses OpenAI API to summarize websites, altered to use techniques from the day 2 exercise to call Ollama models locally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9502a0f-d7be-4489-bb7f-173207e802b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import ollama\n",
    "import requests\n",
    "from bs4 import BeautifulSoup\n",
    "from IPython.display import Markdown, display\n",
    "\n",
    "MODEL = \"llama3.2\"\n",
    "\n",
    "# A class to represent a Webpage\n",
    "# If you're not familiar with Classes, check out the \"Intermediate Python\" notebook\n",
    "\n",
    "# Some websites need you to use proper headers when fetching them:\n",
    "headers = {\n",
    " \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36\"\n",
    "}\n",
    "\n",
    "class Website:\n",
    "\n",
    "    def __init__(self, url):\n",
    "        \"\"\"\n",
    "        Create this Website object from the given url using the BeautifulSoup library\n",
    "        \"\"\"\n",
    "        self.url = url\n",
    "        response = requests.get(url, headers=headers)\n",
    "        soup = BeautifulSoup(response.content, 'html.parser')\n",
    "        self.title = soup.title.string if soup.title else \"No title found\"\n",
    "        for irrelevant in soup.body([\"script\", \"style\", \"img\", \"input\"]):\n",
    "            irrelevant.decompose()\n",
    "        self.text = soup.body.get_text(separator=\"\\n\", strip=True)\n",
    "        \n",
    "# A function that writes a User Prompt that asks for summaries of websites:\n",
    "\n",
    "def user_prompt_for(website):\n",
    "    user_prompt = f\"You are looking at a website titled {website.title}\"\n",
    "    user_prompt += \"\\nThe contents of this website is as follows; \\\n",
    "please provide a short summary of this website in markdown. \\\n",
    "If it includes news or announcements, then summarize these too.\\n\\n\"\n",
    "    user_prompt += website.text\n",
    "    return user_prompt\n",
    "    \n",
    "# Create a messages list for a summarize prompt given a website\n",
    "\n",
    "def create_summarize_prompt(website):\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": \"You are an assistant that analyzes the contents of a website \\\n",
    "and provides a short summary, ignoring text that might be navigation related. \\\n",
    "Respond in markdown.\" },\n",
    "        {\"role\": \"user\", \"content\": user_prompt_for(website)}\n",
    "    ]\n",
    "\n",
    "# And now: call Ollama to summarize\n",
    "\n",
    "def summarize(url):\n",
    "    website = Website(url)\n",
    "    messages = create_summarize_prompt(website)\n",
    "    response = ollama.chat(model=MODEL, messages=messages)\n",
    "    return response['message']['content']\n",
    "    \n",
    "# A function to display this nicely in the Jupyter output, using markdown\n",
    "\n",
    "def display_summary(url):\n",
    "    summary = summarize(url)\n",
    "    display(Markdown(summary))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "037627b0-b039-4ca4-a6d4-84ad8fc6a013",
   "metadata": {},
   "source": [
    "## Pre-requisites\n",
    "\n",
    "Before we can run the script above, we need to make sure Ollama is running on your machine!\n",
    "\n",
    "Simply visit ollama.com and install!\n",
    "\n",
    "Once complete, the ollama server should already be running locally.\n",
    "If you visit:\n",
    "http://localhost:11434/\n",
    "\n",
    "You should see the message Ollama is running."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c2d84fd-2a9b-476d-84ad-4b8522d47023",
   "metadata": {},
   "source": [
    "## Run!\n",
    "\n",
    "Shift+Enter the code below to summarize a website.\n",
    "\n",
    "### NOTE!\n",
    "\n",
    "This will only work with websites that return HTML content, and may return unexpected results for SPAs that are created with JS."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "100829ba-8278-409b-bc0a-82ac28e1149f",
   "metadata": {},
   "outputs": [],
   "source": [
    "display_summary(\"https://edwarddonner.com/2024/11/13/llm-engineering-resources/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffe4e760-dfa6-43fa-89c4-beea547707ac",
   "metadata": {},
   "source": [
    "Edit the URL above, or add code blocks of your own to try it out!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
